1. Introduction¶

1.1. Overview¶

Welcome to this comprehensive notebook, designed as part of a mini-challenge that delves into the fascinating realms of signal and image processing. This notebook serves as a platform for exploring and applying various data science and machine learning techniques to real-world scenarios. Our journey encompasses a series of tasks, each focusing on different aspects of signal and image analysis, leveraging a diverse and multifaceted dataset.

1.2. Objective¶

The primary goal of this notebook is to tackle a series of carefully crafted challenges that span across three key areas:

  1. ECG Signal Analysis: Investigating the intricate patterns within electrocardiogram (ECG) signals using auto-correlation and cross-correlation methods. This task aims to unravel the periodic nature of heart rhythms, offering insights essential for medical diagnostics.

  2. Image Segmentation and Analysis: Focusing on the segmentation of rooftops from aerial images. By employing color-based segmentation techniques in the HSV color space, complemented by morphological operations, this task aims to distinguish complex features within the images, presenting potential applications in urban planning and architectural studies.

  3. Feature Descriptors in Images: Utilizing the Scale-Invariant Feature Transform (SIFT) algorithm for keypoint matching in variously transformed images of a specific subject, Mirage Prime from the game Warframe. This task explores the robustness and adaptability of the SIFT algorithm under varying conditions like lighting, orientation, and perspective changes.

1.3. Approach and Dataset Diversity¶

In approaching these tasks, we employ a diverse dataset that includes ECG signals and high-resolution images. The ECG data provides a rich source for analyzing cardiac signals, while the high-quality images of urban landscapes and digitally rendered characters offer a broad spectrum for our image processing tasks. This diversity not only enriches the analysis but also challenges our methods to be adaptable and robust across different scenarios.

In summary, this notebook is more than just a collection of tasks; it's a narrative of exploration, learning, and innovation in the realms of data science and machine learning, providing valuable insights into the practical applications and future potential of these fields.

DISCLAIMER ChatGPT was used to complete this notebook, for one to reflect on the task, help with code, provide ideas and ensure that the pretty printing is done properly.

Table of contents

    1. Introduction
    • 1.1. Overview
    • 1.2. Objective
    • 1.3. Approach and Dataset Diversity
    1. Correlation in signals
    • 2.1. Problem
      • 2.1.1. Dataset
      • 2.1.2. Approach
        • 2.1.2.1. Metrics and Tools
      • 2.1.3. Revised Signal Processing Steps
      • 2.1.4. Explanation of the Plots
      • 2.1.5. Importance of Signal Representation
    • 2.2. Auto-Correlogram of the ECG Signal
    • 2.3. Cross-Correlation of the ECG Signal with Segements
    • 2.4. Cross-Correlation of the ECG Signal with Modified Segments
      • 2.4.1. Noise
      • 2.4.2. Scaling
      • 2.4.3. Inversion
    • 2.5. Final Conclusion
    1. Segmentation, morphological operations, object properties in images
    • 3.1. Problem
      • 3.1.1. Dataset
      • 3.1.2. Approach
        • 3.1.2.1. Objective
        • 3.1.2.2. Methodology
        • 3.1.2.3. Metrics and Visualization
    • 3.2. Color-based Segmentation in HSV Color Space
    • 3.3. Morphological Opening to Refine the Mask
    • 3.4. Filling Holes
    • 3.5. Removing Small Objects
    • 3.6. Closing Operation to Refine Mask
    • 3.7. Fill Holes Again
    • 3.8. Skeletonization for Minimal Representation
    • 3.9. Labeling and Object Property Analysis
      • 3.9.1. Analysis of Rooftop Properties for Solar Panel Placement
        • 3.9.1.1. Area
        • 3.9.1.2. Perimeter
        • 3.9.1.3. Eccentricity
      • 3.9.2. Consideration of Brightness for Solar Panel Placement
    • 3.10. Final Conclusion
    1. feature descriptors in images (LE4): keypoint matching
    • 4.1. SIFT (Scale-Invariant Feature Transform)
      • 4.1.1. What is SIFT?
      • 4.1.2. How SIFT Works
      • 4.1.3. Capabilities of SIFT
      • 4.1.4. Limitations
    • 4.2. Data Description: Warframe Images of Mirage Prime
      • 4.2.1. Image Description
    • 4.3. Tests Breakdown
      • 4.3.1. Perspective Variation Test
      • 4.3.2. Lighting Variation Test
      • 4.3.3. Special Effects Test
      • 4.3.4. Distance Variation Test
      • 4.3.5. Horizontal Flip Test
      • 4.3.6. Rotational Robustness Test
      • 4.3.7. Optimizating SIFT Parameters
        • 4.3.7.1. Definition of Good Matches
    • 4.4. Keypoint Matching Tests
      • 4.4.1. Perspective Variation Test
        • 4.4.1.1. Objective
        • 4.4.1.2. Expected Result
        • 4.4.1.3. Analysis
      • 4.4.2. Lighting Variation Test - Bright
        • 4.4.2.1. Objective
        • 4.4.2.2. Expected Result
        • 4.4.2.3. Analysis
      • 4.4.3. Lighting Variation Test - Dimmed
        • 4.4.3.1. Objective
        • 4.4.3.2. Expected Result
        • 4.4.3.3. Analysis
      • 4.4.4. Special Effects Test
        • 4.4.4.1. Objective
        • 4.4.4.2. Expected Result
        • 4.4.4.3. Analysis
      • 4.4.5. Distance Variation Test
        • 4.4.5.1. Objective
        • 4.4.5.2. Expected Result
        • 4.4.5.3. Analysis
      • 4.4.6. Horizontal Flip Test
        • 4.4.6.1. Objective
        • 4.4.6.2. Expected Result
        • 4.4.6.3. Analysis
      • 4.4.7. Rotational Robustness Test
        • 4.4.7.1. Objective
        • 4.4.7.2. Expected Result
        • 4.4.7.3. Analysis
      • 4.4.8. Final Conclusion
In [35]:
# matplotlib retina display
%config InlineBackend.figure_format='retina'

import wfdb
import numpy as np
import matplotlib.pyplot as plt

from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import ccf
import cv2
from skimage import io 

from skimage.morphology import remove_small_objects, dilation
from scipy.ndimage import binary_fill_holes
from skimage.morphology import square, skeletonize
from skimage.measure import label, regionprops

from matplotlib.patches import Patch

import time

2. Correlation in signals¶

Find 1 signal that contains recurring patterns. Then use auto-correlation to analyze the recurring patterns within your signal. Can the periodicity of your pattern be visualized via auto-correlogram? Discuss your choice of method and parameters as well as the results in approx. 150 words. Now cut out a piece of your signal and try to find it in the original signal using cross-correlation. How do you recognize that the position fits? Describe it in 1-2 sentences. Now change your cut-out piece slightly and see if it can still be found via cross-correlation. What types of changes are tolerated? Which ones are not? Discuss the results in approx. 150 words.

Attention: this is an open task. Set your own limits or come to the consultation if you are not clear about the limits. Apt choice of data, to-the-point creativity, moderate diversity and critical thinking are required.

2.1. Problem¶

Electrocardiogram (ECG) Analysis for Pattern Recognition

Electrocardiography is a non-invasive method widely used to assess the electrical and muscular functions of the heart. While ECG signals are crucial for diagnosing cardiovascular diseases, their analysis often requires the identification of recurring patterns, such as the QRS complex, P waves, and T waves. However, ECG signals can vary significantly between individuals, making it challenging to develop a universal approach for pattern recognition and analysis. In this section, we aim to tackle this challenge by employing signal processing techniques, particularly focusing on auto-correlation and cross-correlation methods, to analyze and identify recurring patterns in ECG signals.

2.1.1. Dataset¶

ECG-ID Database

The ECG-ID database is used for this analysis, containing 100 ECG signals from different individuals. Each signal spans 10 seconds and has a sampling rate of 1000 Hz, stored in a binary format. The variability in these signals makes the dataset ideal for studying recurring patterns in ECG data.

Source: PhysioNet - ECG-ID Database

This dataset is particularly suitable for our study as cross-correlation is frequently utilized in ECG signal processing for detecting specific components like the QRS complex.

2.1.2. Approach¶

We will employ the following signal processing techniques:

  1. Auto-Correlation: To identify the periodic nature of patterns within an ECG signal.
  2. Cross-Correlation: To locate and analyze specific segments within the ECG signal, including segments with added noise, scaling, time-shifting, and inversion.

The approach is designed to evaluate the robustness and sensitivity of correlation techniques in signal analysis.

2.1.2.1. Metrics and Tools¶

  1. Correlation Coefficients: These will quantify the degree of similarity between segments and the original signal.
  2. Visual Analysis: Plots of the ECG signals and their corresponding correlation graphs will be used for qualitative assessment.

By combining quantitative metrics with qualitative visual analysis, we aim to achieve a comprehensive understanding of the signal patterns and their detectability through correlation methods.

In [36]:
# Path to the ECG file (excluding the extension which wfdb handles)
ecg_file_path = 'data/ecg-id-database-1.0.0/Person_08/rec_1'

# Reading the record
record = wfdb.rdrecord(ecg_file_path)

# Extracting signals and annotations
ecg_signals = record.p_signal
annotations = wfdb.rdann(ecg_file_path, 'atr')

plt.figure(figsize=(12, 6))
for i, channel in enumerate(ecg_signals.T):
    plt.subplot(len(ecg_signals.T), 1, i+1)
    plt.plot(channel)
    plt.title(f'ECG Channel {i+1}')
    plt.xlabel('Sample')
    plt.ylabel('Amplitude (mV)')
    plt.xticks(np.arange(0, len(channel)+1, 1000))
plt.tight_layout()
plt.show()
No description has been provided for this image

Here we observe the ECG signals from two separate channels. These channels represent different leads capturing the electrical activity of the heart. The distinct peaks and troughs correspond to various phases of the cardiac cycle, with the taller spikes likely representing the QRS complex, which is indicative of ventricular depolarization.

2.1.3. Revised Signal Processing Steps¶

The updated code performs the following essential steps:

  1. Signal Acquisition: The wfdb library reads the ECG signal from the rec_1 record, which includes both the .dat and .hea files to obtain signal and metadata.
  2. Signal Visualization: The ECG signal for each channel is plotted to provide a visual representation of the heart's electrical activity, which is essential for identifying patterns and anomalies.

2.1.4. Explanation of the Plots¶

The provided plots offer insights into the cardiac activity captured by the ECG:

  • ECG Channel 1 Plot: Reveals the raw ECG tracing, which may include the P wave, QRS complex, and T wave, all of which are critical for diagnosing cardiac conditions.
  • ECG Channel 2 Plot: Provides a similar view from a different lead, which can be used to cross-verify the features observed in Channel 1.

2.1.5. Importance of Signal Representation¶

While the focus of the current exercise is on pattern recognition via correlation techniques, it's essential to represent the ECG signal in a manner consistent with medical standards for several reasons:

  • Clinical Relevance: Accurate representation ensures that any patterns detected are clinically relevant and comparable to standard ECG readings.
  • Analysis Integrity: Proper scaling and baseline correction preserve the integrity of the signal, ensuring that the patterns we analyze are true to the physiological events they represent.
  • Versatility for Further Analysis: Although amplitude values may not be critical for correlation tasks, a correctly scaled signal allows for a broader range of analyses, which might be pertinent in a different context.

By maintaining these standards, we ensure that the signal is appropriately prepared for both the correlation analysis and any potential future analyses that may require precise amplitude values.

2.2. Auto-Correlogram of the ECG Signal¶

In [37]:
ecg_signal = record.p_signal[:,1]

ecg_signal_subset = ecg_signal[:1300] 


fig, axs = plt.subplots(2, 1, figsize=(12, 9))

# Plot the original signal
axs[0].plot(ecg_signal_subset)
axs[0].set_title('Original Signal (First 1300 Samples)')
axs[0].set_xlabel('Sample')
axs[0].set_ylabel('Amplitude (mV)')
axs[0].set_xticks(np.arange(0, len(ecg_signal_subset)+1, 50))

# Auto-correlation
plot_acf(ecg_signal_subset, lags=len(ecg_signal_subset)-1, ax=axs[1], adjusted=True)
axs[1].set_title('Auto-Correlogram')
axs[1].set_xlabel('Lag')
axs[1].set_ylabel('Correlation')
axs[1].set_xticks(np.arange(0, len(ecg_signal_subset)+1, 50))


plt.tight_layout()
plt.show()
No description has been provided for this image

In the top plot showcasing the first 1300 samples of the ECG signal, we observe a clear periodicity, with a cycle roughly every 350 samples, indicative of the heart's rhythmic beating. The accompanying auto-correlogram reinforces this observation, as it highlights regular peaks at intervals of approximately 350 samples. These peaks represent the signal's self-similarity at those intervals, confirming the periodic nature of the cardiac cycle. The correlogram serves as a mathematical confirmation of the visually apparent cycles and is particularly useful for identifying consistent patterns in complex physiological signals like ECG, where manual inspection might be challenging. The analysis hints at the underlying regularity of heartbeats and possibly other physiological rhythms, such as breathing, although such secondary patterns are not as distinctly represented in the auto-correlogram we would need to look at a bigger sample, which makes the plot no longer readable.

In [38]:
def perform_time_lag_cross_correlation(signal, segment):
    # Time Lag Correlation (ccf) backward 
    ccf_correlation = ccf(segment, signal, adjusted=False)
    lags_ccf = np.arange(len(ccf_correlation))

    ccf_correlation = ccf_correlation / np.max(ccf_correlation)

    # ChatGPT and https://scicoding.com/cross-correlation-in-python-3-popular-packages/
    # Notice that, similarly to the SciPy implementation, we needed to remove the padding. Also, Statsmodels provides the cross-correlation response in reversed order with respect to the other schemes, which is why we needed to flip the result.
    return lags_ccf, ccf_correlation[0:(len(signal)+1)][::-1]


def plot_segment_and_time_lag_correlation(ecg_signal, segment, segment_start, title_suffix):
    segment_end = segment_start + len(segment)
    full_length_segment = np.zeros_like(ecg_signal)
    full_length_segment[segment_start:segment_end] = segment

    fig, axs = plt.subplots(3, 1, figsize=(15, 15))
    
    axs[0].plot(ecg_signal, label='Original Signal')
    axs[0].set_title("Original ECG Signal")
    axs[0].set_xlabel("Sample Index")
    axs[0].set_ylabel("Signal Amplitude")
    axs[0].legend()
    axs[0].set_xticks(np.arange(0, len(ecg_signal)+1, 500))

    # Store the y-axis limits of the original signal plot
    y_limits = axs[0].get_ylim()

    axs[1].plot(full_length_segment, label='Segmented Signal', color='orange')
    axs[1].set_title(f"Segmented Signal Aligned with Original (Segment {title_suffix})")
    axs[1].set_xlabel("Sample Index")
    axs[1].set_ylabel("Signal Amplitude")
    axs[1].legend()
    # Setting the y-axis limits to be the same as the original signal plot
    axs[1].set_ylim(y_limits)
    axs[1].set_xticks(np.arange(0, len(ecg_signal)+1, 500))

    # Performing time lag cross-correlation
    lags_ccf, ccf_correlation = perform_time_lag_cross_correlation(ecg_signal, segment)

    # Plotting time lag correlation
    axs[2].plot(ccf_correlation, color='green')
    axs[2].set_title(f'Time Lag Correlation for Segment {title_suffix}\n Peak Correlation Lag {np.argmax(ccf_correlation)} \n Peak Correlation Score {np.max(ccf_correlation):.2f}\n Lowest Correlation Lag {np.argmin(ccf_correlation)} \n Lowest Correlation Score {np.min(ccf_correlation):.2f}')
    axs[2].set_xlabel('Lags')
    axs[2].set_ylabel('Correlation')
    axs[2].set_xticks(np.arange(0, len(ecg_signal)+1, 500))
    axs[2].set_ylim([-1, 1])
    
    # set vertical line at lag with highest correlations 
    axs[2].axvline(x=np.argmax(ccf_correlation), color='red', linestyle='--')
    # set vertical line for the lowest correlation
    axs[2].axvline(x=np.argmin(ccf_correlation), color='blue', linestyle='--')

    plt.tight_layout()
    plt.show()


def add_noise(segment, noise_level):
    noise = np.random.normal(0, noise_level, segment.shape)
    return segment + noise

def scale_amplitude(segment, scale_factor):
    return segment * scale_factor

def time_shift(segment, shift):
    return np.roll(segment, shift)

def invert_signal(segment):
    return -segment


ecg_signal_subset = ecg_signal[:10000]

segment_1 = ecg_signal[0:350]
segment_2 = ecg_signal[0:1000]

2.3. Cross-Correlation of the ECG Signal with Segements¶

In [39]:
plot_segment_and_time_lag_correlation(ecg_signal_subset, segment_1, 0, "0-350")
No description has been provided for this image

As we previously saw there was a periodic pattern at about every 350 samples. Segmenting this section from the signal and applying cross-correlation shows that this holds up, as the peak correlation is at lag 349. This means that the segment is most similar to the original signal when it is shifted by 350 samples.

In [40]:
plot_segment_and_time_lag_correlation(ecg_signal_subset, segment_2, 0, "0-1250")
No description has been provided for this image

Looking at a bigger segment we can see a smiliar results, once again the peak aligns with chosen segement. However compared to the 350 sample segment, we see that the correlation is still frequent but has more ups and downs. This is because the segment is longer and therefore has more points where the correlation is low, we can however again identify the periodicity of the signal.

2.4. Cross-Correlation of the ECG Signal with Modified Segments¶

Now after we have seen that the cross-correlation works well with the original signal we are now taking a closer look at the segments. We will modify the segments by adding noise, scaling, time-shifting and inverting them. We will then apply cross-correlation to see how these modifications affect the correlation.

2.4.1. Noise¶

In [41]:
# Extracting a segment and adding noise
noisy_segment = add_noise(segment_1, noise_level=0.3)

# Plotting the original and noisy segments
plot_segment_and_time_lag_correlation(ecg_signal_subset, noisy_segment, 0, "Noisy 0-350")
No description has been provided for this image

After adding noise to the segment we can see that the correlation is still very high, however the peak correlation is now at lag 350, so shifted by 1. We can however see how the noise affects the cross-correlation as it increases between the relevant correlation spots. The periodicity is still visible but not as clear as before. Adding too much noise would eventually result in a correlation that is not periodic anymore.

2.4.2. Scaling¶

In [42]:
# Extracting a segment and adding scaling
scaled_segment = scale_amplitude(segment_1, scale_factor=20)

# Plotting the original and noisy segments
plot_segment_and_time_lag_correlation(ecg_signal_subset, scaled_segment, 0, "Scaled 0-350")
No description has been provided for this image

Scaling seems to have not as much of an effect on the correlation as noise. The correlation is still very high and the periodicity is still visible. The peak correlation is at lag 349, so the same as the original segment. This shows that scaling the amplitude of the segment does not affect the correlation.

In [43]:
# Extracting a segment and adding time shift
shifted_segment = time_shift(segment_1, shift=100)

plot_segment_and_time_lag_correlation(ecg_signal_subset, shifted_segment, 0, "Time Shifted 0-350")
No description has been provided for this image

Shifting the signal by 100 has the expected effect of shifting the correlation by 100. As the segment is shifted by 100, the correlation is highest at lag 249 which is exactly 100 off the initial max correlation lag. The periodicity is clearly visible and the correlation is still high just shifted.

2.4.3. Inversion¶

In [44]:
# Extracting a segment and inverting
inverted_segment = invert_signal(segment_1)

plot_segment_and_time_lag_correlation(ecg_signal_subset, inverted_segment, 0, "Inverted 0-350")
No description has been provided for this image

Inverting the signal has a huge impact on the correlation. The correlation is now negative, however the periodicity is still visible. The new highest correlation is at lag 6557, however we can also see that the lowest correlation is again at 349 with a correlation of -3.29 which is just a fragment of the inversion process of the segment correlating with the original signal.

2.5. Final Conclusion¶

The auto-correlation analysis of the ECG signal revealed a clear periodicity, indicative of the heart's rhythmic beating and potential respiratory influences.

Subsequent cross-correlation confirmed the recurring patterns, with modifications like noise addition and signal shifting slightly altering the peaks but still preserving the periodicity. Notably, noise introduced variability, whereas scaling the amplitude had a negligible effect on correlation quality. Time shifts translated directly to peak displacements in the correlogram, demonstrating the method's resilience to such changes. Inversion, however, resulted in a negative correlation, highlighting the method's sensitivity to signal polarity. These findings underscore the robustness of correlation techniques in detecting recurring cardiac patterns despite minor signal alterations. The analysis demonstrates the potential of these methods for ECG analysis, emphasizing their applicability in varied clinical scenarios while also revealing limitations, particularly in distinguishing between physiological signals and noise.

Looking at bigger segments could introduce even more periodicity, like breathing, which can already be slightly noticed by looking at the peaks, however this would require a bigger sample size to prove properly.

3. Segmentation, morphological operations, object properties in images¶

Find 1 image that contains several similar objects. These objects are to be segmented using suitable methods. The results are to be saved as labeled images (binary or 1 label per class). Show how you have improved the label masks using morphological operations. For each operation used, explain in 1-2 sentences why you are using this operation. Also show the intermediate results of your applied operations in individual images and that you have only changed the objects minimally (e.g. no shifts, reductions or enlargements). At the end, extract your individual objects, count and measure 2-3 properties of your extracted objects using suitable methods. Explain in 1-2 sentences for each property why you chose it and whether the results are useful.

Then create a skeleton of one of your objects that is as minimal but representative as possible and output the number of pixels of the skeleton.

Discuss your findings and results in approx. 150 words.

3.1. Problem¶

In urban planning and geographical analysis, accurately segmenting rooftops from aerial images is vital. This segmentation assists in solar panel placement by looking at rooftops and their properties further helping potential placements and planning in the future. This analysis is crucial for understanding the scale of buildings in the area and for identifying potential locations for solar panels.

We will work with an image capturing an aerial view of a region, focusing on identifying rooftops. The challenge is to accurately segment these rooftops, from the rest of the image. Our goal is to refine the segmentation so each rooftop is distinctly isolated, clearly differentiating the buildings from their surroundings.

3.1.1. Dataset¶

For the second exercise a picture of an aerial view of a city (Qingdao, Shandong, China) is used, shot by Clark Ma on a DJI, FC3582. The picture was taken from a drone and shows a city with many houses all with red rooftops. The picture is stored in png format.

source: https://unsplash.com/photos/an-aerial-view-of-a-city-with-red-roofs-LUq0rbuUk6o

This picture was chosen because it contains many similar objects (houses with red rooftops) which is ideal for segmentation. This type of segmentation is also used in satellite imagery, which makes this image a good example for this exercise.

3.1.2. Approach¶

3.1.2.1. Objective¶

The primary objective of this exercise is to segment rooftops from an aerial image. This involves identifying and isolating rooftops using color-based segmentation techniques, refining the segmentation with morphological operations, and analyzing the properties of the extracted objects.

3.1.2.2. Methodology¶

Our approach includes the following steps:

  1. Color-Based Segmentation in HSV Color Space: Using the HSV color model to differentiate rooftops based on their unique color hues.
  2. Morphological Operations: Applying techniques such as opening, closing, filling holes, and removing small objects to refine the segmentation.
  3. Labeling and Analyzing Rooftop Properties: Measuring various properties of the segmented rooftops, such as area, to understand their characteristics.
  4. Skeletonization: Creating a minimal representation of a selected rooftop for structural analysis.

Each step enhances the accuracy of rooftop segmentation, ensuring a clear distinction between rooftops and their environment.

3.1.2.3. Metrics and Visualization¶

We will visually inspect the results at each significant step to assess the effectiveness of our techniques. This inspection includes:

  • Examining the results of color-based segmentation.
  • Assessing the impact of each morphological operation.
  • Analyzing the properties of segmented rooftops.
  • Evaluating the skeletonized representation for minimalism and structural clarity.

Through these visualizations and analyses, we aim to demonstrate the effectiveness of our methods in isolating rooftops from the aerial image.

In [45]:
original_image = io.imread('data/Roofs_Satellite.jpg')

gray_image = cv2.cvtColor(original_image, cv2.COLOR_RGB2GRAY)

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(original_image)
plt.title('Original Image')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(gray_image, cmap='gray')
plt.title('Grayscale Image')
plt.axis('off')

plt.show()
No description has been provided for this image

3.2. Color-based Segmentation in HSV Color Space¶

The HSV color space effectively separates color (hue) from brightness (value), aiding in isolating specific colors. Here, we segment rooftops by targeting red and brown hues, common colors for rooftops in satellite imagery. This initial step is crucial as it forms the foundation for our subsequent rooftop isolation.

In [46]:
# Convert to HSV color space
hsv_image = cv2.cvtColor(original_image, cv2.COLOR_RGB2HSV)

# Define color ranges for red and brown
lower_red1 = np.array([0, 50, 20])
upper_red1 = np.array([10, 255, 255])
lower_brown = np.array([10, 50, 20])
upper_brown = np.array([15, 255, 255])
lower_red2 = np.array([160, 50, 20])
upper_red2 = np.array([180, 255, 255])

mask1 = cv2.inRange(hsv_image, lower_red1, upper_red1)
mask2 = cv2.inRange(hsv_image, lower_brown, upper_brown)
mask3 = cv2.inRange(hsv_image, lower_red2, upper_red2)

# Combine masks to capture red and brown shades
combined_mask = mask1 + mask2 + mask3

# Plot the initial combined mask
plt.figure(figsize=(6, 6))
plt.imshow(combined_mask, cmap='gray')
plt.title('Initial Combined Mask')
plt.axis('off')
plt.show()
No description has been provided for this image

3.3. Morphological Opening to Refine the Mask¶

Morphological opening (erosion followed by dilation) is applied to smooth the edges of larger objects and remove small noise. This refinement is vital as it enhances the quality of our segmentation, focusing on significant objects while reducing the impact of small artifacts.

In [47]:
# Perform morphological operations to refine the mask
kernel = np.ones((3, 3), np.uint8)
refined_mask = cv2.morphologyEx(combined_mask, cv2.MORPH_OPEN, kernel)

# Plot the refined mask
plt.figure(figsize=(6, 6))
plt.imshow(refined_mask, cmap='gray')
plt.title('Refined Mask after Opening')
plt.axis('off')
plt.show()
No description has been provided for this image

3.4. Filling Holes¶

After opening, small holes in the mask can lead to incomplete rooftop shapes. Using binary_fill_holes fills these gaps, ensuring complete rooftop shapes.

In [48]:
# Fill holes using binary_fill_holes
filled_mask = binary_fill_holes(refined_mask).astype(np.uint8)

# plot the filled mask
plt.figure(figsize=(6, 6))
plt.imshow(filled_mask, cmap='gray')
plt.title('Mask after Filling Holes')
plt.axis('off')
plt.show()
No description has been provided for this image

3.5. Removing Small Objects¶

Following this we can see that the roofs are clean but there is still a lot of fragmented noise in the image. We can remove this noise by removing small objects from the mask. This is done by using the remove_small_objects function from scikit-image. This function removes all connected components (objects) smaller than the specified size.

In [49]:
# Remove small objects from the filled mask
min_size = 6000
cleaned_mask_bool = remove_small_objects(filled_mask > 0, min_size=min_size)

# Convert boolean mask back to uint8 for visualization and further processing
cleaned_mask = (cleaned_mask_bool * 255).astype(np.uint8)

# Label and analyze properties of the closed mask
labeled_mask = label(cleaned_mask)
properties = regionprops(labeled_mask)

# Count of objects before and after filling holes
objects_before_filling = np.max(label(refined_mask))
objects_after_cleaning = len(properties)
print(f"Objects before filling holes: {objects_before_filling}")
print(f"Objects after all cleaning operations: {objects_after_cleaning}")

# Plot the mask after filling holes and removing small objects
plt.figure(figsize=(6, 6))
plt.imshow(cleaned_mask, cmap='gray')
plt.title('Mask after Removing Small Objects')
plt.axis('off')
plt.show()
Objects before filling holes: 23412
Objects after all cleaning operations: 68
No description has been provided for this image

3.6. Closing Operation to Refine Mask¶

After all of this our segmentation is already in a good state. We were able to get the number of objects from 23412 to 84. However, we can further refine it by applying a closing operation.

Closing (dilation followed by erosion) further refines our mask. It's essential for closing any small holes or gaps within the rooftops, ensuring that the rooftops are represented as solid, contiguous shapes. This is just to ensure that the labels in the end are not split up (e.g. if there is a small gap between two rooftops, they should still be labeled as one object ideally).

In [50]:
# Close small holes on the roofs with a larger kernel
large_kernel = np.ones((5, 5), np.uint8)
closed_mask = cv2.morphologyEx(cleaned_mask, cv2.MORPH_CLOSE, large_kernel, iterations=5)

# Plot the mask after the closing operation
plt.figure(figsize=(6, 6))
plt.imshow(closed_mask, cmap='gray')
plt.title('Mask after Closing Operation')
plt.axis('off')
plt.show()
No description has been provided for this image

3.7. Fill Holes Again¶

After the closing operation, we can see that the rooftops are now solid shapes. However, there are still some small holes in the rooftops. We can fill these holes again using the binary_fill_holes function from scikit-image.

In [51]:
# fill holes again 
filled_mask_2 = binary_fill_holes(closed_mask).astype(np.uint8)

# Label and analyze properties of the closed mask
labeled_mask = label(filled_mask_2)
properties = regionprops(labeled_mask)

print(f"Count of objects: {len(properties)}")


# plot the filled mask
plt.figure(figsize=(6, 6))
plt.imshow(filled_mask_2, cmap='gray')
plt.title('Mask after Filling Holes')
plt.axis('off')
plt.show()
Count of objects: 46
No description has been provided for this image

This is the final mask that we will use for further analysis. We can see that the rooftops are now clearly separated from the background, the final count of objects is 46.

3.8. Skeletonization for Minimal Representation¶

Skeletonization reduces a rooftop to its basic structure, minimizing the pixels used to represent it while preserving its shape and connectivity. This process is essential for shape analysis and feature extraction. It helps us understand the structural characteristics of the rooftops in a simplified form.

In [52]:
areas = [prop.area for prop in properties]

# Find the largest area index
largest_area_index = np.argmax(areas)

# Skeletonize the largest area
skeleton = skeletonize(labeled_mask == largest_area_index + 1)

# Skeletonize all areas
skeleton_all = skeletonize(labeled_mask > 0)

# Dilate the skeletons for visibility
thick_skeleton_all = dilation(skeleton_all, square(10))
thick_skeleton = dilation(skeleton, square(10))

plt.figure(figsize=(8, 6))
ax = plt.gca()

# Initialize a color image with the same dimensions, all black
color_image = np.zeros((thick_skeleton_all.shape[0], thick_skeleton_all.shape[1], 3), dtype=np.uint8)

# Assign the thick skeleton of all rooftops to white
color_image[thick_skeleton_all == 1] = [255, 255, 255]

# Overlay the largest rooftop in red
color_image[thick_skeleton == 1] = [255, 0, 0]

plt.imshow(color_image)

# Create a legend and position it outside the plot
legend_elements = [Patch(facecolor='white', edgecolor='black', label='All Rooftops'),
                   Patch(facecolor='red', edgecolor='black', label='Largest Rooftop')]
plt.legend(handles=legend_elements, loc='upper left', bbox_to_anchor=(1, 1))

plt.title(f'Skeleton of all rooftops\npixel count of largest rooftop: {np.sum(skeleton)} pixels\npixel count of all rooftops: {np.sum(skeleton_all)} pixels')
plt.axis('off')
plt.tight_layout()
plt.show()
No description has been provided for this image
In [53]:
# Overlay bounding boxes on the original image
overlay_with_bboxes = original_image.copy()
for prop in properties:
    minr, minc, maxr, maxc = prop.bbox
    cv2.rectangle(overlay_with_bboxes, (minc, minr), (maxc, maxr), (0, 255, 0), 3)
In [54]:
# Overlay the skeleton on the original image
skeleton_overlay = original_image.copy()

# change the color of the skeleton to blue to make it more visible
skeleton_overlay[thick_skeleton_all] = [0, 0, 255] 

titles = ['Original Image', 'Combined Mask', 'Refined Mask', 'Filled Mask',
          'Cleaned Mask', 'Closed Mask', 'Refilled Mask', 'All Skeletons', 'Skeleton Overlay', 'Bounding Boxes']
images = [original_image, combined_mask, refined_mask, filled_mask,
          cleaned_mask, closed_mask, filled_mask_2 , thick_skeleton_all, skeleton_overlay, overlay_with_bboxes]

plt.figure(figsize=(18, 14))
for i in range(len(titles)):
    plt.subplot(2, 5, i + 1)
    if titles[i] == 'Skeleton Overlay':
        plt.imshow(images[i])
    else:
        plt.imshow(images[i], cmap='gray' if i > 1 else None)
    plt.title(titles[i])
    plt.axis('off')
plt.tight_layout()
plt.show()
No description has been provided for this image

This final plot showcases the full step by step process of our segmentation. We can see that the rooftops are now clearly separated from the background and that the segmentation is very accurate. The skeletonization also worked well and we can see that the largest rooftop is represented by a minimal number of pixels. The bounding boxes also show that the segmentation is very accurate and that the rooftops are clearly separated from each other.

In [55]:
# Define the final mask as the result of the last operations in the user's code
final_mask = filled_mask_2

# Apply the final mask to the original image to create a segmented image
segmented_image = cv2.bitwise_and(original_image, original_image, mask=final_mask)

# Display the original and segmented images
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.imshow(original_image)
plt.title('Original Image')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(segmented_image)
plt.title('Segmented Image')
plt.axis('off')

plt.show()
No description has been provided for this image

3.9. Labeling and Object Property Analysis¶

Labeling distinct segmented rooftops enables us to analyze and measure properties like area. This analysis provides insights into the size distribution of the rooftops, crucial for understanding the scale of buildings in the area.

In [56]:
# Assuming 'properties' is a list of properties for each region
properties = regionprops(labeled_mask)

areas = [prop.area for prop in properties]
perimeters = [prop.perimeter for prop in properties]
eccentricities = [prop.eccentricity for prop in properties]

metrics = {
    'Area': (np.mean(areas), np.median(areas)),
    'Perimeter': (np.mean(perimeters), np.median(perimeters)),
    'Eccentricity': (np.mean(eccentricities), np.median(eccentricities))
}

def plot_histogram(data, title, x_label, color, ax):
    mean_val, median_val = metrics[title]
    ax.hist(data, bins=20, color=color, alpha=0.7, edgecolor='black')
    ax.set_title(f'Histogram of Rooftop {title}\nTotal Rooftops: {len(data)}')
    ax.set_xlabel(x_label)
    ax.set_ylabel('Frequency')
    ax.axvline(mean_val, color='orange', linestyle='dashed', linewidth=2, label=f'Mean: {mean_val:.2f}')
    ax.axvline(median_val, color='red', linestyle='dashed', linewidth=2, label=f'Median: {median_val:.2f}')
    ax.legend()
    ax.grid(True)

fig, axs = plt.subplots(1, 3, figsize=(18, 6))

plot_histogram(areas, 'Area', 'Area (pixels)', 'blue', axs[0])
plot_histogram(perimeters, 'Perimeter', 'Perimeter (pixels)', 'green', axs[1])
plot_histogram(eccentricities, 'Eccentricity', 'Eccentricity', 'red', axs[2])

plt.tight_layout()
plt.show()
No description has been provided for this image
In [57]:
largest_properties = sorted(properties, key=lambda x: x.area, reverse=True)[:5]

fig, axs = plt.subplots(1, 5, figsize=(20, 4))
for i, prop in enumerate(largest_properties):
    minr, minc, maxr, maxc = prop.bbox
    
    # Mask and crop the rooftop
    rooftop_mask = labeled_mask == prop.label
    cropped_mask = rooftop_mask[minr:maxr, minc:maxc]
    rooftop_image = original_image[minr:maxr, minc:maxc]
    masked_rooftop = np.where(cropped_mask[..., None], rooftop_image, 0)

    area = prop.area
    perimeter = prop.perimeter
    eccentricity = prop.eccentricity
    stats_title = f'Rooftop {i+1} \nArea: {area}\n Perimeter: {perimeter:.2f}\n Eccentricity: {eccentricity:.2f}'

    axs[i].imshow(masked_rooftop)
    axs[i].set_title(stats_title)
    axs[i].axis('off')

plt.show()
No description has been provided for this image

3.9.1. Analysis of Rooftop Properties for Solar Panel Placement¶

3.9.1.1. Area¶

The Area of a rooftop is a fundamental factor when assessing its suitability for solar panel installation. Larger areas provide more space for solar panels, which can increase the potential energy yield. It is essential to consider the total usable area, as some parts of the roof may be unsuitable for solar panels due to obstructions or non-ideal angles relative to the sun's path.

3.9.1.2. Perimeter¶

The Perimeter of a rooftop gives insights into its shape complexity. A simpler, more rectangular perimeter typically indicates a more straightforward installation process and potentially more efficient panel arrangement. Complex perimeters may require custom solutions, which can increase costs and reduce the area available for solar panels.

3.9.1.3. Eccentricity¶

Eccentricity measures the elongation of the rooftop. A lower eccentricity suggests a more circular shape, which may not be optimal for panel placement. In contrast, moderate eccentricity values closer to those of a rectangle could indicate a rooftop that is well-suited for solar panels, as these shapes are more compatible with the standard panel dimensions and can facilitate a more efficient layout.

3.9.2. Consideration of Brightness for Solar Panel Placement¶

The Brightness of a rooftop in an image can give a preliminary indication of its sun exposure at the moment the photograph was taken. However, because a single image is a snapshot in time, it doesn't provide a complete picture of the sun's movement across the sky at different times of the day and year. Hence, while brightness could be a useful metric, it was not analyzed in this instance due to the limitations of the data.

For a thorough assessment, a temporal analysis would be required, tracking how sunlight and shadows interact with the rooftops throughout the day and across seasons. This additional property analysis could significantly enhance the understanding of potential solar panel placement by identifying areas that receive consistent sunlight, which is crucial for optimizing solar energy capture.

3.10. Final Conclusion¶

In this segmentation task, we effectively isolated rooftops from an aerial image using HSV color-based methods, refined by morphological operations. Opening smoothed boundaries, and closing with hole-filling preserved rooftop integrity. The methods minimized object alteration, ensuring shapes and sizes remained true to their original forms. We measured areas, perimeters, and eccentricities, yielding valuable data on building dimensions and styles. However, the segmentation faced challenges: one street was erroneously included due to its color similarity to rooftops, and two roofs with greyish solar installations were not cleanly segmented, highlighting the method's sensitivity to color variations. Incorporating grey shades would have confounded the segmentation, as streets share similar hues (I really tried...). The skeletonization of a rooftop provided a minimal structural representation. Despite these limitations, the analysis offers significant insights for urban planning, though it underscores the need for more nuanced color differentiation in segmentation processes. The end product, while not perfect, provides a valuable foundation for further analysis and planning on how to approach the problem of solar panel placement.

4. feature descriptors in images (LE4): keypoint matching¶

Select a few images with the same subject/object that show the object from different angles, from a different perspective, from a different distance or rotated. Then apply the keypoint descriptor {'SIFT'} assigned to you to "match" at least two of your images via keypoints. Choose suitable parameters and a suitable number of keypoints. Explain your decisions in 1-2 sentences. Show your results visually and make 2-3 observations. Discuss in approx. 150 words how robust the algorithm {'SIFT'} assigned to you is with regard to transformations or different recording conditions (light, ...) and its computational efficiency. Show at least one of these properties using your example data. Discuss the results and findings in 2-3 sentences.

Please note: this is an open task. Set yourself a framework for the limitation or come to the consultation if the limits are not clear to you. Apt choice of data, to-the-point creativity, moderate diversity and critical thinking are required.

4.1. SIFT (Scale-Invariant Feature Transform)¶

4.1.1. What is SIFT?¶

Scale-Invariant Feature Transform (SIFT) is a widely used algorithm in computer vision for detecting and describing local features in images. SIFT is particularly celebrated for its ability to identify distinctive features (or keypoints) in images that are invariant to scaling, rotation, and partially invariant to changes in illumination and 3D camera viewpoint.

4.1.2. How SIFT Works¶

SIFT operates in several key stages (general idea):

  1. Scale-Space Extrema Detection: SIFT starts by searching for extrema (maximum and minimum) points in the scale space (a series of images scaled differently). It uses the Difference of Gaussians (DoG) approach which involves the subtraction of two blurred images at different scales.

  2. Keypoint Localization: Potential keypoints are refined to get more accurate results. This step removes low-contrast keypoints and edges, enhancing the stability of these points.

  3. Orientation Assignment: Each keypoint is assigned one or more orientations based on local image gradient directions. This step ensures that the keypoint descriptor is rotation invariant.

  4. Keypoint Descriptor: The local gradient data around each keypoint (in its scale space) is transformed into a descriptor. The descriptor is highly distinctive and partially invariant to the remaining variations, such as lighting or 3D viewpoint.

  5. Keypoint Matching: In the final stage, keypoints between different images are matched by comparing their descriptors using some distance measure (like Euclidean distance)

4.1.3. Capabilities of SIFT¶

  • Robustness: SIFT features are robust to changes in image scale, noise, illumination, and rotation.
  • Distinctiveness: The descriptors provide a unique fingerprint to match different instances of the same object across many images.
  • Wide Applications: SIFT can be used in various applications like object recognition, robotic mapping and navigation, 3D modeling, gesture recognition, and video tracking.

4.1.4. Limitations¶

While SIFT is a powerful tool, it has some limitations. It can be computationally intensive, especially for high-resolution images, and may not perform well in conditions of extreme blurring or occlusion. Despite these, SIFT remains a foundational technique in the field of computer vision.

In the following exercise we will be testing the capabilities of SIFT by applying it to a series of images of the same object under different conditions.

4.2. Data Description: Warframe Images of Mirage Prime¶

The dataset for this exercise comprises a series of high-quality images of Mirage Prime, a character from the popular video game Warframe. These images were personally created using the game's Captura Mode, a feature that allows players to capture cinematic in-game photographs. This mode provides the flexibility to experiment with various lighting conditions, camera angles, and compositions, making it an ideal tool for generating a diverse set of images.

More info:

  • https://www.warframe.com/
  • https://warframe.fandom.com/wiki/Mirage/Prime
  • https://warframe.fandom.com/wiki/Captura

4.2.1. Image Description¶

  • Main Image: The original image from which other transformations are derived, serving as the baseline for comparisons.
  • Angled: The main image is captured from a different angle to assess the descriptor's performance with varying perspectives.
  • Bright: Brightness of the main image is increased to evaluate the descriptor's sensitivity to changes in illumination.
  • Dimmed: The main image with reduced brightness, further testing the descriptor's robustness to lighting conditions.
  • Lightning: An image with special lightning effect that obscure key features, challenging the descriptor's performance.
  • Far: The main image captured from a further distance, testing the scale invariance of the keypoint descriptor.
  • Flipped: The main image is flipped horizontally, to check the descriptor's invariance to mirror transformations.
  • Rotated: The main image is rotated by a significant angle to examine the rotational invariance of the descriptor.
In [58]:
def adjust_brightness(image, brightness_offset):
    """
    Adjusts the brightness of an image.
    """
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    v = np.clip(np.int16(v) + brightness_offset, 0, 255).astype(np.uint8)
    final_hsv = cv2.merge((h, s, v))
    return cv2.cvtColor(final_hsv, cv2.COLOR_HSV2BGR)


def flip_image(image, flip_code):
    """
    Flips an image.
    """
    return cv2.flip(image, flip_code)

def rotate_image(image, angle):
    """
    Rotates an image.
    """
    height, width = image.shape[:2]
    rotation_matrix = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1)
    return cv2.warpAffine(image, rotation_matrix, (width, height))

def crop_center(img, crop_width, crop_height):
    """
    Crops the center of the image.
    """
    center_x, center_y = img.shape[1] // 2, img.shape[0] // 2
    start_x = max(center_x - crop_width // 2, 0)
    start_y = max(center_y - crop_height // 2, 0)
    end_x = start_x + crop_width
    end_y = start_y + crop_height
    return img[start_y:end_y, start_x:end_x]

def load_image(path):
    """
    Loads an image from a file path.
    """
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    if img is None:
        raise FileNotFoundError(f"Failed to load image from {path}")
    return img

def plot_images(images, titles, figsize=(15, 10)):
    """
    Plots a list of images.
    """
    plt.figure(figsize=figsize)
    for i, (img, title) in enumerate(zip(images, titles)):
        plt.subplot(2, 4, i + 1)
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        plt.title(title)
        plt.axis('off')
    plt.tight_layout()
    plt.show()

crop_size = (850, 750)

# Load images
main_image_path = 'data/Warframe/main.jpg'
main_image = load_image(main_image_path)
main_image = crop_center(main_image, *crop_size)

bright_image = adjust_brightness(main_image, 100)
dimmed_image = adjust_brightness(main_image, -100)

flip_image = flip_image(main_image, 1)
rotate_image = rotate_image(main_image, 70)

angled_image = load_image('data/Warframe/angled1.jpg')
angled_image = crop_center(angled_image, *crop_size)

lightning_image = load_image('data/Warframe/lightning.jpg')
lightning_image = crop_center(lightning_image, *crop_size)

far_image = load_image('data/Warframe/far.jpg')
far_image = crop_center(far_image, *crop_size)

# Crop all images
cropped_images = [main_image, angled_image, bright_image, dimmed_image, lightning_image, far_image, flip_image, rotate_image]

titles = ['Main', 'Angled', 'Bright', 'Dimmed', 'Lightning', 'Far', 'Flipped', 'Rotated']

plot_images(cropped_images, titles)
No description has been provided for this image

4.3. Tests Breakdown¶

4.3.1. Perspective Variation Test¶

  • Objective: To verify if SIFT maintains keypoint matching accuracy when the perspective of the subject changes.
  • Setup: The baseline 'main.jpg' will be matched against 'angled.jpg' to assess performance when the angle of view is altered.

4.3.2. Lighting Variation Test¶

  • Objective: To assess SIFT's robustness to changes in lighting conditions.
  • Setup: The 'main.jpg' will be matched against 'bright.jpg' and 'dimmed.jpg' to evaluate keypoint matching under both increased and decreased illumination.

4.3.3. Special Effects Test¶

  • Objective: To evaluate the impact of visual effects on SIFT's keypoint matching capability.
  • Setup: The 'main.jpg' will be compared to 'lightning.jpg', testing how well keypoints can be matched when the image includes special lighting effects.

4.3.4. Distance Variation Test¶

  • Objective: To confirm SIFT's scale invariance by matching keypoints of the subject at varying distances.
  • Setup: The 'main.jpg' will be matched against 'far.jpg', simulating a change in the distance from which the subject is viewed.

4.3.5. Horizontal Flip Test¶

  • Objective: To test SIFT's ability to handle mirror transformations of the subject.
  • Setup: The 'main.jpg' will be matched against 'flipped.jpg', checking for keypoint matching when the image is horizontally flipped.

4.3.6. Rotational Robustness Test¶

  • Objective: To determine the rotational invariance of SIFT keypoint matching.
  • Setup: The 'main.jpg' will be matched against 'rotated.jpg', assessing the matching performance when the subject is rotated by a significant angle.

4.3.7. Optimizating SIFT Parameters¶

Before we begin with the tests, we need to find the optimal parameters for SIFT.

The SIFT algorithm has several parameters that can be tuned to improve performance. The parameters I will optimize for are:

  • nfeatures: The number of best features to retain. The higher the number, the more keypoints are detected. However, this can lead to a higher number of false positives. - contrastThreshold: The contrast threshold used to filter out weak keypoints. The higher the threshold, the more keypoints are filtered out. Barely had any impact on my results, so I decided to ignore this parameter.
  • edgeThreshold: The threshold used to filter out edge-like keypoints. The higher the threshold, the more keypoints are filtered out.
  • sigma: The sigma of the Gaussian applied to the input image at the octave #0. If your image is captured with a higher resolution, you may need to increase this value to detect more keypoints.

In order to optimize these parameters, we will look at the development of the average number of good matches for each parameter. The average number of good matches is calculated by matching the main image with all test images and then calculating the average number of good matches. On top of that we are also looking at the computational efficiency of the algorithm by measuring the average time it takes to match the main image with all test images.

4.3.7.1. Definition of Good Matches¶

The SIFT (Scale-Invariant Feature Transform) code identifies good matches between keypoints in two images by employing the following criteria:

  1. KeyPoint Detection: It first detects keypoints and computes their descriptors in both images. These descriptors encapsulate the local features around each keypoint.

  2. Matching Strategy: The code utilizes the Brute Force Matcher with cv2.NORM_L2 (Euclidean distance) to find the two nearest neighbors for each descriptor in one image among the descriptors in the other image.

  3. Ratio Test: For each pair of nearest neighbors (two closest matches), it applies Lowe's ratio test. A match is considered good if the distance of the closest match is less than 75% (default ratio) of the distance of the second-closest match. This ratio test rejects matches that are not significantly closer than the nearest incorrect match, reducing false positives.

  4. Sorting Matches: The good matches are then sorted by their distance, with the assumption that lower distances correspond to better matches.

When plot_matches is set to True, the draw_matches method visualizes the best matches (default is top 10) on the combined images to qualitatively assess the matching quality. The number of good matches and their respective distances can be used to optimize the SIFT parameters for specific image scenarios, as demonstrated in the KeypointMatcherOptimizer class.

In [59]:
class KeypointMatcher:
    def __init__(self, sift_params=None):
        if sift_params is None:
            sift_params = {'nfeatures': 1200, 'contrastThreshold': 0.03, 'edgeThreshold': 10, 'sigma': 1.2}
        self.sift = cv2.SIFT_create(**sift_params)

    def ensure_grayscale(self, image):
        if len(image.shape) == 3:
            return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        return image

    def detect_and_compute(self, image):
        gray_image = self.ensure_grayscale(image)
        keypoints, descriptors = self.sift.detectAndCompute(gray_image, None)
        return keypoints, descriptors

    def match_keypoints(self, descriptors1, descriptors2, ratio_test=0.75):
        bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
        matches = bf.knnMatch(descriptors1, descriptors2, k=2)
        good_matches = [m for m, n in matches if m.distance < ratio_test * n.distance]
        return sorted(good_matches, key=lambda x: x.distance)

    def draw_matches(self, img1, kp1, img2, kp2, matches):
        matched_image = cv2.drawMatches(img1, kp1, img2, kp2, matches, None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
        return cv2.cvtColor(matched_image, cv2.COLOR_BGR2RGB)

    def run_test(self, main_image, test_image, test_name, plot_matches=True, n_matches=10):
        kp_main, des_main = self.detect_and_compute(main_image)
        kp_test, des_test = self.detect_and_compute(test_image)
        good_matches = self.match_keypoints(des_main, des_test)

        if plot_matches:
            matched_image = self.draw_matches(main_image, kp_main, test_image, kp_test, good_matches[:n_matches])
            plt.figure(figsize=(12, 6))
            plt.imshow(matched_image)
            plt.axis('off')
            plt.title(f'{test_name} - Matched Keypoints\n Good Matches: {len(good_matches)} (Showing the best {n_matches})')
            plt.show()
        else:
            return good_matches


class KeypointMatcherOptimizer:
    def __init__(self, main_image, test_images):
        self.main_image = main_image
        self.test_images = test_images

    def run_range_optimization(self, parameter_ranges):
        parameter_scores = {param: [] for param in parameter_ranges}

        for param, (start, end, step) in parameter_ranges.items():
            for value in np.arange(start, end, step):
                sift_params = {p: value if p == param else next(iter(parameter_ranges[p])) for p in parameter_ranges}
                score = self.test_sift_params(sift_params)
                parameter_scores[param].append((value, score))

        return parameter_scores

    def test_sift_params(self, sift_params):
        matcher = KeypointMatcher(sift_params)
        total_good_matches = 0
        total_time_taken = 0

        for test_image in self.test_images:
            start_time = time.time()
            good_matches = matcher.run_test(self.main_image, test_image, test_name="", plot_matches=False)
            end_time = time.time()

            total_good_matches += len(good_matches)
            total_time_taken += end_time - start_time

        avg_good_matches = total_good_matches / len(self.test_images)
        avg_time_taken = total_time_taken / len(self.test_images)
        return avg_good_matches, avg_time_taken

# we are optimizing for the more problematic images as the other ones seem to be fine no matter what parameters we use
test_images = [angled_image, far_image, lightning_image]

# Define parameter ranges for optimization, contrastThreshold will be ignored as it had no impact on the results.
# I tested with bigger ranges but these ranges are representative of my thought process in decision making and take less time to run.
parameter_ranges = {
    'nfeatures': (500, 10000, 500),
    'edgeThreshold': (5, 50, 5), 
    'sigma': (1.0, 8.0, 0.2)
}

# Run optimization
optimizer = KeypointMatcherOptimizer(main_image, test_images)
parameter_scores = optimizer.run_range_optimization(parameter_ranges)

for param, scores in parameter_scores.items():
    values, matches_time_pairs = zip(*scores)
    matches, times = zip(*matches_time_pairs)

    fig, ax1 = plt.subplots()

    ax1.plot(values, matches, 'b-o')
    ax1.set_xlabel(param)
    ax1.set_ylabel('Average Good Matches', color='b')
    ax1.tick_params('y', colors='b')

    ax2 = ax1.twinx()
    ax2.plot(values, times, 'r-o')
    ax2.set_ylabel('Average Time Taken (seconds)', color='r')
    ax2.tick_params('y', colors='r')
    ax2.set_ylim(0, 1)

    plt.title(f'Impact of {param} on Performance and Efficiency')
    plt.grid(True)
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Taking a look at these plots allows us to make some decisions about the parameters. We can see that nfeatures don't seem to improve the results after 5000. The edgeThreshold also stagnates after 25. Sigma seems to be a bit more complicated, but values around 2.0 seem to be the better.

Concerning efficiency we can generally see that the computational impact of nfeatures and edgeThreshold do not linearly increase with the value, they seem to be rather constant. Sigma on the other hand seems to have a linear impact on the computational efficiency.

In [60]:
# print best parameter values to be sure.
print(parameter_scores['nfeatures'][9][0])
print(parameter_scores['edgeThreshold'][4][0])
print(parameter_scores['sigma'][6][0])
5000
25
2.1999999999999997

After some playing around, I decided to use the following parameters for the tests:

In [61]:
sift_params = {
    'nfeatures': 5000,
    'edgeThreshold': 25,
    'sigma': 2.2
}

matcher = KeypointMatcher(sift_params)

4.4. Keypoint Matching Tests¶

In this section, we perform keypoint matching tests using the SIFT algorithm. Each test compares the main image against a transformed version to evaluate the robustness of the keypoint matching process under various image transformations.

4.4.1. Perspective Variation Test¶

In [62]:
matcher.run_test(main_image, angled_image, 'Main vs Angled', n_matches=20)
No description has been provided for this image

4.4.1.1. Objective¶

To verify if SIFT maintains keypoint matching accuracy when the perspective of the subject changes.

4.4.1.2. Expected Result¶

Given SIFT's capability for perspective invariance, it is expected to match keypoints effectively despite changes in the viewing angle.

4.4.1.3. Analysis¶

Taking a look at the matched keypoints, we can see that SIFT does struggle with identifying keypoints when the perspective of the subject changes. Given Mirage's symmetrical design, SIFT struggled differentiating between left and right side of the face, as there a few matches which flip from left to the right side, while this showcases SIFT's ability to identify keypoints despite perspective changes, it also shows that it is not perfect and lacks some higher alignment capabilities.

Nonetheless, it was able to match a few keypoints well together, namely on on the mask left side anc center.

Overall the results are not as great as expected as the angle is not that extreme.

4.4.2. Lighting Variation Test - Bright¶

In [63]:
matcher.run_test(main_image, bright_image, 'Main vs Bright', n_matches=20)
No description has been provided for this image

4.4.2.1. Objective¶

To assess SIFT's robustness to increased illumination conditions.

4.4.2.2. Expected Result¶

SIFT is designed to be invariant to illumination changes, so it should match keypoints effectively under increased brightness.

4.4.2.3. Analysis¶

Here we can clearly see that brightness does not hinder SIFT's keypoint matching, if it is just a change in brightness. As seen by the identified good matches of over 1000, SIFT has no problem identifying keypoints in the brightened image.

4.4.3. Lighting Variation Test - Dimmed¶

In [64]:
matcher.run_test(main_image, dimmed_image, 'Main vs Dimmed', n_matches=20)
No description has been provided for this image

4.4.3.1. Objective¶

To assess SIFT's robustness to decreased illumination conditions.

4.4.3.2. Expected Result¶

SIFT should maintain accurate keypoint matching even when the lighting is reduced, due to its illumination invariance.

4.4.3.3. Analysis¶

Just like with the brightened image, SIFT was able to match keypoints effectively despite the reduced illumination. This is shown by the identified good matches of over 900. Interesting is to see how the keypoints slightly differ between dimmed and brightened image, as the keypoints matched for the dimmed image are on the brighter parts of the image (eyes right side), wheras it is the opposite for the brightened image (matching the dimmer areas with higher confidence, like mask). This highlights the importance of the descriptor in the keypoint matching process, as it is able to identify keypoints despite changes in illumination, but overall still prefers balanced illumination.

4.4.4. Special Effects Test¶

In [65]:
matcher.run_test(main_image, lightning_image, 'Main vs Lightning', n_matches=20)
No description has been provided for this image

4.4.4.1. Objective¶

To evaluate the impact of visual effects, such as lightning, on SIFT's keypoint matching capability.

4.4.4.2. Expected Result¶

The SIFT algorithm is expected to identify keypoints reliably, even with the presence of visual effects that may obscure parts of the image.

4.4.4.3. Analysis¶

Observing the matched keypoints, we can see that SIFT was able to match keypoints effectively despite the presence of the lightning effect, obscuring parts of Mirage's face. However with 55 good matches identified, it is clear that the lightning effect did hinder the keypoint matching process. Even the quality of good matches is not perfect. Here again we can see that the symmetry of the face is a problem for SIFT, as it struggles to differentiate between left and right side of the face. The crystal in the center of the forehead is matched left to right side, which is not completely wrong, but not correct either.

4.4.5. Distance Variation Test¶

In [66]:
matcher.run_test(main_image, far_image, 'Main vs Far', n_matches=20)
No description has been provided for this image

4.4.5.1. Objective¶

To confirm SIFT's scale invariance by matching keypoints when the subject appears smaller, as if viewed from a farther distance.

4.4.5.2. Expected Result¶

The SIFT descriptors should be able to match keypoints on the subject despite changes in apparent size due to distance variations.

4.4.5.3. Analysis¶

Taking a look at the matched keypoints, that SIFT did extremely poorly here. Only very few points around neck and collar are matched correctly. Other than that it seems to be extremely overwhelmed by the distance variation.

4.4.6. Horizontal Flip Test¶

In [67]:
matcher.run_test(main_image, flip_image, 'Main vs Flipped', n_matches=20)
No description has been provided for this image

4.4.6.1. Objective¶

To test SIFT's ability to match keypoints when the image is subject to a horizontal flip.

4.4.6.2. Expected Result¶

SIFT's descriptors are expected to be invariant to mirror-image transformations and thus should match keypoints in the flipped image correctly.

4.4.6.3. Analysis¶

The horizontal flip did not hinder SIFT's keypoint matching, demonstrating its effectiveness in handling such transformations. While not as easy to find many good matches as with for example the dimmed version with 202 good matches it seems to be able to match keypoints well. All keypoints matched are also correct.

4.4.7. Rotational Robustness Test¶

In [68]:
matcher.run_test(main_image, rotate_image, 'Main vs Rotated', n_matches=20)
No description has been provided for this image

4.4.7.1. Objective¶

To determine the rotational invariance of SIFT keypoint matching by testing with a significantly rotated image.

4.4.7.2. Expected Result¶

Given SIFT's design, it should accurately match keypoints regardless of image rotation.

4.4.7.3. Analysis¶

The final test shows something rather interesting: Rotating the image by 70 degrees did not have an impact on the keypoint matching process. SIFT was able to match keypoints effectively despite the significant rotation. This is shown by the 1000 good matches identified. This is a very good result and shows that SIFT is very robust to rotational changes. again all matches are correct.

4.4.8. Final Conclusion¶

The SIFT algorithm demonstrates commendable robustness across various image transformations. It consistently identified keypoints in images with altered perspectives and lighting conditions, indicating strong invariance to these factors. Specifically, SIFT performed reliably in matching keypoints between images with increased or decreased brightness, confirming its robustness to illumination changes. Furthermore, its ability to handle horizontal flips and significant rotations underscores its orientation invariance. However, the algorithm showed limitations under severe perspective changes, such as when the subject was viewed from a far distance or different angle, resulting in a small number of good matches, which were not that good.

In terms of computational efficiency, while not the fastest due to its detailed and complex nature, SIFT's performance remains within acceptable bounds for real-world applications, at least in my use case. The optimization process revealed that beyond certain parameter thresholds, improvements in match quality plateau, suggesting that a balance between accuracy and speed can be achieved by fine-tuning parameters such as nfeatures, edgeThreshold, and sigma. sigma appears to b the one which has the biggest effect on computational efficiency, as it has a linear impact on the computational efficiency, whereas the others seem to have a rather constant impact. However these tests were all concluded on my MacBook Pro M1 2021 and might not be representative for other systems.